COGNIMUSE: a multimodal video database annotated with saliency, events, semantics and emotion with application to summarization
نویسندگان
چکیده
Research related to computational modeling for machine-based understanding requires ground truth data for training, content analysis, and evaluation. In this paper, we present a multimodal video database, namely COGNIMUSE, annotated with sensory and semantic saliency, events, cross-media semantics, and emotion. The purpose of this database is manifold; it can be used for training and evaluation of event detection and summarization algorithms, for classification and recognition of audio-visual and cross-media events, as well as for emotion tracking. In order to enable comparisons with other computational models, we propose state-of-the-art algorithms, specifically a unified energy-based audio-visual framework and a method for text saliency computation, for the detection of perceptually salient events from videos. Additionally, a movie summarization system for the automatic production of summaries is presented. Two kinds of evaluation were performed, an objective based on the saliency annotation of the database and an extensive qualitative human evaluation of the automatically produced summaries, where we investigated what composes high-quality movie summaries, where both methods verified the appropriateness of the proposed methods. The annotation of the database and the code for the summarization system can be found at http://cognimuse.cs.ntua.gr/database.
منابع مشابه
A Novel Approach to Background Subtraction Using Visual Saliency Map
Generally human vision system searches for salient regions and movements in video scenes to lessen the search space and effort. Using visual saliency map for modelling gives important information for understanding in many applications. In this paper we present a simple method with low computation load using visual saliency map for background subtraction in video stream. The proposed technique i...
متن کاملMultimodal Visual Concept Learning with Weakly Supervised Techniques
Despite the availability of a huge amount of video data accompanied by descriptive texts, it is not always easy to exploit the information contained in natural language in order to automatically recognize video concepts. Towards this goal, in this paper we use textual cues as means of supervision, introducing two weakly supervised techniques that extend the Multiple Instance Learning (MIL) fram...
متن کاملMEC 2016: The Multimodal Emotion Recognition Challenge of CCPR 2016
Emotion recognition is a significant research filed of pattern recog‐ nition and artificial intelligence. The Multimodal Emotion Recognition Challenge (MEC) is a part of the 2016 Chinese Conference on Pattern Recognition (CCPR). The goal of this competition is to compare multimedia processing and machine learning methods for multimodal emotion recognition. The challenge also aims to provide a c...
متن کاملA Comparative Study of Gender and Age Classification in Speech Signals
Accurate gender classification is useful in speech and speaker recognition as well as speech emotion classification, because a better performance has been reported when separate acoustic models are employed for males and females. Gender classification is also apparent in face recognition, video summarization, human-robot interaction, etc. Although gender classification is rather mature in a...
متن کاملThe PlayMancer Database: A Multimodal Affect Database in Support of Research and Development Activities in Serious Game Environment
The present paper reports on a recent effort that resulted in the establishment of a unique multimodal affect database, referred to as the PlayMancer database. This database was created in support of the research and development activities, taking place within the PlayMancer project, which aim at the development of a serious game environment in support of treatment of patients with behavioural ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- EURASIP J. Image and Video Processing
دوره 2017 شماره
صفحات -
تاریخ انتشار 2017